Getting Started

For many of the basics in RMarkdown the RStudio introduction is very worth while. Almost every R visualization output is compatible with it.

Common Needs

Database Connections

require(DBI)
wh_host <- Sys.getenv('WH_HOST')
wh_db <- Sys.getenv('WH_DB')
wh_user <- Sys.getenv('WH_USER')
wh_pass <- Sys.getenv('WH_PASS')

# Connect to the CountyStat DataWarehouse
wh_con <- dbConnect(odbc::odbc(), driver = "{ODBC Driver 17 for SQL Server}", server = wh_host, database = wh_db, UID = wh_user, pwd = wh_pass, Trusted_Connection= "yes")

Always keep your credentials for the DataWarehouse in an .env file. An example is in this repo, but in normal circumstances this file should be in the .gitignore file.

To use DBI check out their documentation, but for the most part you will be using either the dbGetQuery() or dbReadTable() function.

Data Visualizations

The easiest package for quickly creating charts and graphs in R is ggplot2. Once of the best resources for ggplot2 visualizations is the Topic 50 ggplot2 Visualizations from Selva Prabhakaran. This includes full code and examples using you can access directly from R. This site also has a good introduction to ggplot2 if you find the one from RStudio insufficient.

require(ggplot2)

g <- ggplot(mpg, aes(manufacturer)) + geom_bar(aes(fill=class), width = 0.5) + 
  theme(axis.text.x = element_text(angle=65, vjust=0.6)) + 
  labs(title="Histogram on Categorical Variable", 
       subtitle="Manufacturer across Vehicle Classes",
       caption = "EPA Fuel Economy 1999-2008") 

g

If you are using this document as your template, that means you can create interactive visuals. The easiest way to do this with ggplot is using the ggplotly function from the plotly library.

require(plotly)

ggplotly(g) %>%
  layout(annotations = list(x = 1,
                            y = -0.28, 
                            text = 'EPA Fuel Economy 1999-2008', 
                            showarrow = F,
                            xref='paper',
                            yref='paper', 
                            xanchor='right', 
                            yanchor='auto', 
                            xshift=0, 
                            yshift=0,
                            font=list(size=12, color = "darkgray")),
         title = list(text = paste0('Histogram on Categorical Variable',
                                    '<br>',
                                    '<sup>',
                                    'Manufacturer across Vehicle Classes',
                                    '</sup>')))

Note that not all of the labels translate, subtitles and captions are lost and have to be added through other methods.

Colors

A great resource for color palettes is colorbrewer, which is easy to implement in ggplot2.

g + scale_fill_brewer(palette = "Spectral")

Tables

The best functions for tables are DT() and kable(). The best option will depend on the number of rows and situation you find yourself in. Both of these functions allow for a wide amount of customization and formating. CountyStat has a standard table format to start with, however Analysts should feel free to check out the documentation for both packages especially when looking to format the rows to highlight certain values or increase readability, or improve other functions.

DT

DT has extensive documentation showing its features. Below is the standard format for a CountyStat DT table. One of the biggest benefits to this package is the horizontal scroll option and download button feature.

require(DT)

datatable(iris, 
          caption = "Fisher's or Anderson's", 
          rownames = F,
          escape = F,
          extensions = 'Buttons', 
          options = list(
            dom = 'Bfrtip',
            buttons = c('copy', 'csv', 'excel', 'pdf', 'print'),
            # list(scrollX = TRUE),
            initComplete = JS(
              "function(settings, json) {",
              "$(this.api().table().header()).css({'background-color': '#008080', 'color': '#fff'});",
              "}")
            )
          )

kable w/ kableExtra

Use the kableExtra package to format your kable() tables. Like the DT() example above a standard CountyStat format is available below, but Analysts should feel free to use the wide variety of features the kableExtra library provides to improve readability and convey a message.

require(kableExtra)

iris %>%
  sample_n(15) %>% # Select 15 random rows
  kbl(caption = "Fisher's or Anderson's") %>%
  kable_styling(font_size = 12) %>%
  row_spec(0, bold = T, background = "#008080", color = "white")
Fisher’s or Anderson’s
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
4.8 3.0 1.4 0.3 setosa
5.1 3.4 1.5 0.2 setosa
5.5 2.6 4.4 1.2 versicolor
6.7 3.3 5.7 2.5 virginica
4.9 3.6 1.4 0.1 setosa
6.0 3.0 4.8 1.8 virginica
6.1 2.9 4.7 1.4 versicolor
5.2 2.7 3.9 1.4 versicolor
6.7 3.1 4.7 1.5 versicolor
7.3 2.9 6.3 1.8 virginica
4.8 3.0 1.4 0.1 setosa
6.6 2.9 4.6 1.3 versicolor
4.4 3.0 1.3 0.2 setosa
7.7 3.8 6.7 2.2 virginica
5.1 3.8 1.6 0.2 setosa